Noise adaptive speech recognition based on sequential noise parameter estimation
Internal identifier: 00AB73 (Main/Exploration); previous: 00AB72; next: 00AB74
Authors: Kaisheng Yao [Japan]; Kuldip K. Paliwal [Japan, Australia]; Satoshi Nakamura [Japan]
Source:
- Speech communication [0167-6393]; 2004.
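French descriptors
- Pascal (Inist): Reconnaissance parole, Estimation paramètre, Estimation séquentielle, Réduction bruit, Algorithme EM, Processus non stationnaire, Bruit additif
English descriptors
- KwdEn: Additive noise, EM algorithm, Noise reduction, Non stationary process, Parameter estimation, Sequential estimation, Speech recognition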
Abstract
In this paper, a noise adaptive speech recognition approach is proposed for recognizing speech which is corrupted by additive non-stationary background noise. The approach sequentially estimates noise parameters, through which a non-linear parametric function adapts mean vectors of acoustic models. In the estimation process, posterior probability of state sequence given observation sequence and the previously estimated noise parameter sequence is approximated by the normalized joint likelihood of active partial paths and observation sequence given the previously estimated noise parameter sequence. The Viterbi process provides the normalized joint-likelihood. The acoustic models are not required to be trained from clean speech and they can be trained from noisy speech. The approach can be applied to perform continuous speech recognition in presence of non-stationary noise. Experiments conducted on speech contaminated by simulated and real non-stationary noise show that when acoustic models are trained from clean speech, the noise adaptive speech recognition system provides improvements in word accuracy as compared to the normal noise compensation system (which assumes the noise to be stationary) in slowly time-varying noise. When the acoustic models are trained from noisy speech, the noise adaptive speech recognition system is found to be helpful to get improved performance in slowly time-varying noise over a system employing multi-conditional training.
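The abstract says a non-linear parametric function adapts the mean vectors of the acoustic models from the current noise estimate, but it does not give the function's form. The sketch below is illustrative only: it assumes the widely used log-add combination in the log-spectral domain, mean_noisy = log(exp(mean_speech) + exp(mean_noise)). The names `adapt_means`, `clean_means` and `noise_track` are hypothetical and are not taken from the paper.

```python
import math

def adapt_means(clean_means, noise_mean):
    """Adapt log-spectral mean vectors to a noise estimate.

    Assumed form (not confirmed by the abstract): the log-add rule
    log(exp(m) + exp(n)), evaluated per dimension in a numerically
    stable way as max(m, n) + log1p(exp(-|m - n|)).
    """
    return [
        [max(m, n) + math.log1p(math.exp(-abs(m - n)))
         for m, n in zip(mean_vec, noise_mean)]
        for mean_vec in clean_means
    ]

# Hypothetical two-dimensional means for two model states.
clean_means = [[10.0, 4.0], [6.0, 8.0]]

# Hypothetical slowly varying noise estimates, one per frame.
noise_track = [[2.0, 2.0], [3.0, 2.5], [4.0, 3.0]]

# With a sequential estimator, the means are re-adapted whenever
# the noise parameter estimate is updated.
for frame_index, noise_mean in enumerate(noise_track):
    adapted = adapt_means(clean_means, noise_mean)
    print(frame_index, adapted)
```

Under this assumption, a dimension where the noise lies far below the speech mean is left almost unchanged, while a dimension dominated by noise moves toward the noise mean. This is why a time-varying noise estimate changes the adapted models from frame to frame.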
Affiliations:
- Japan: Kaisheng Yao, Satoshi Nakamura, Kuldip K. Paliwal
- Australia: Kuldip K. Paliwal
Links toward previous steps (curation, corpus...)
- to stream PascalFrancis, to step Corpus: 004F03
- to stream PascalFrancis, to step Curation: 001211
- to stream PascalFrancis, to step Checkpoint: 004A34
- to stream Main, to step Merge: 00B861
- to stream Main, to step Curation: 00AB73
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Noise adaptive speech recognition based on sequential noise parameter estimation</title>
<author><name sortKey="Kaisheng Yao" sort="Kaisheng Yao" uniqKey="Kaisheng Yao" last="Kaisheng Yao">KAISHENG YAO</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>ATR Spoken Language Translation Research Labs</s1>
<s2>Kyoto</s2>
<s3>JPN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<wicri:noRegion>ATR Spoken Language Translation Research Labs</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Paliwal, Kuldip K" sort="Paliwal, Kuldip K" uniqKey="Paliwal K" first="Kuldip K." last="Paliwal">Kuldip K. Paliwal</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>ATR Spoken Language Translation Research Labs</s1>
<s2>Kyoto</s2>
<s3>JPN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<wicri:noRegion>ATR Spoken Language Translation Research Labs</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><inist:fA14 i1="02"><s1>School of Microelectronic Engineering, Griffith University</s1>
<s2>Brisbane</s2>
<s3>AUS</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Australie</country>
<wicri:noRegion>Brisbane</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Nakamura, Satoshi" sort="Nakamura, Satoshi" uniqKey="Nakamura S" first="Satoshi" last="Nakamura">Satoshi Nakamura</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>ATR Spoken Language Translation Research Labs</s1>
<s2>Kyoto</s2>
<s3>JPN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<wicri:noRegion>ATR Spoken Language Translation Research Labs</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">04-0276268</idno>
<date when="2004">2004</date>
<idno type="stanalyst">PASCAL 04-0276268 INIST</idno>
<idno type="RBID">Pascal:04-0276268</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">004F03</idno>
<idno type="wicri:Area/PascalFrancis/Curation">001211</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">004A34</idno>
<idno type="wicri:explorRef" wicri:stream="PascalFrancis" wicri:step="Checkpoint">004A34</idno>
<idno type="wicri:doubleKey">0167-6393:2004:Kaisheng Yao:noise:adaptive:speech</idno>
<idno type="wicri:Area/Main/Merge">00B861</idno>
<idno type="wicri:Area/Main/Curation">00AB73</idno>
<idno type="wicri:Area/Main/Exploration">00AB73</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Noise adaptive speech recognition based on sequential noise parameter estimation</title>
<author><name sortKey="Kaisheng Yao" sort="Kaisheng Yao" uniqKey="Kaisheng Yao" last="Kaisheng Yao">KAISHENG YAO</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>ATR Spoken Language Translation Research Labs</s1>
<s2>Kyoto</s2>
<s3>JPN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<wicri:noRegion>ATR Spoken Language Translation Research Labs</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Paliwal, Kuldip K" sort="Paliwal, Kuldip K" uniqKey="Paliwal K" first="Kuldip K." last="Paliwal">Kuldip K. Paliwal</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>ATR Spoken Language Translation Research Labs</s1>
<s2>Kyoto</s2>
<s3>JPN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<wicri:noRegion>ATR Spoken Language Translation Research Labs</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><inist:fA14 i1="02"><s1>School of Microelectronic Engineering, Griffith University</s1>
<s2>Brisbane</s2>
<s3>AUS</s3>
<sZ>2 aut.</sZ>
</inist:fA14>
<country>Australie</country>
<wicri:noRegion>Brisbane</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Nakamura, Satoshi" sort="Nakamura, Satoshi" uniqKey="Nakamura S" first="Satoshi" last="Nakamura">Satoshi Nakamura</name>
<affiliation wicri:level="1"><inist:fA14 i1="01"><s1>ATR Spoken Language Translation Research Labs</s1>
<s2>Kyoto</s2>
<s3>JPN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Japon</country>
<wicri:noRegion>ATR Spoken Language Translation Research Labs</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">Speech communication</title>
<title level="j" type="abbreviated">Speech commun.</title>
<idno type="ISSN">0167-6393</idno>
<imprint><date when="2004">2004</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">Speech communication</title>
<title level="j" type="abbreviated">Speech commun.</title>
<idno type="ISSN">0167-6393</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Additive noise</term>
<term>EM algorithm</term>
<term>Noise reduction</term>
<term>Non stationary process</term>
<term>Parameter estimation</term>
<term>Sequential estimation</term>
<term>Speech recognition</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Reconnaissance parole</term>
<term>Estimation paramètre</term>
<term>Estimation séquentielle</term>
<term>Réduction bruit</term>
<term>Algorithme EM</term>
<term>Processus non stationnaire</term>
<term>Bruit additif</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">In this paper, a noise adaptive speech recognition approach is proposed for recognizing speech which is corrupted by additive non-stationary background noise. The approach sequentially estimates noise parameters, through which a non-linear parametric function adapts mean vectors of acoustic models. In the estimation process, posterior probability of state sequence given observation sequence and the previously estimated noise parameter sequence is approximated by the normalized joint likelihood of active partial paths and observation sequence given the previously estimated noise parameter sequence. The Viterbi process provides the normalized joint-likelihood. The acoustic models are not required to be trained from clean speech and they can be trained from noisy speech. The approach can be applied to perform continuous speech recognition in presence of non-stationary noise. Experiments conducted on speech contaminated by simulated and real non-stationary noise show that when acoustic models are trained from clean speech, the noise adaptive speech recognition system provides improvements in word accuracy as compared to the normal noise compensation system (which assumes the noise to be stationary) in slowly time-varying noise. When the acoustic models are trained from noisy speech, the noise adaptive speech recognition system is found to be helpful to get improved performance in slowly time-varying noise over a system employing multi-conditional training.</div>
</front>
</TEI>
<affiliations><list><country><li>Australie</li>
<li>Japon</li>
</country>
</list>
<tree><country name="Japon"><noRegion><name sortKey="Kaisheng Yao" sort="Kaisheng Yao" uniqKey="Kaisheng Yao" last="Kaisheng Yao">KAISHENG YAO</name>
</noRegion>
<name sortKey="Nakamura, Satoshi" sort="Nakamura, Satoshi" uniqKey="Nakamura S" first="Satoshi" last="Nakamura">Satoshi Nakamura</name>
<name sortKey="Paliwal, Kuldip K" sort="Paliwal, Kuldip K" uniqKey="Paliwal K" first="Kuldip K." last="Paliwal">Kuldip K. Paliwal</name>
</country>
<country name="Australie"><noRegion><name sortKey="Paliwal, Kuldip K" sort="Paliwal, Kuldip K" uniqKey="Paliwal K" first="Kuldip K." last="Paliwal">Kuldip K. Paliwal</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Asie/explor/AustralieFrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 00AB73 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 00AB73 | SxmlIndent | more
To add a link to this page within the Wicri network
{{Explor lien |wiki= Wicri/Asie |area= AustralieFrV1 |flux= Main |étape= Exploration |type= RBID |clé= Pascal:04-0276268 |texte= Noise adaptive speech recognition based on sequential noise parameter estimation }}
This area was generated with Dilib version V0.6.33.